
    Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees

    We consider a distributionally robust stochastic optimization problem and formulate it as a stochastic two-level composition optimization problem using the mean-semideviation risk measure. In this setting, we consider a single time-scale algorithm with two variants of inner function value tracking: linearized tracking of a continuously differentiable loss function, and SPIDER tracking of a weakly convex loss function. We adopt the norm of the gradient of the Moreau envelope as our measure of stationarity and show that a sample complexity of $\mathcal{O}(\varepsilon^{-3})$ is achievable in both cases, with only a larger constant in the second case. Finally, we demonstrate the performance of our algorithm on a robust learning example and a weakly convex, non-smooth regression example.
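
    To make the risk measure concrete, here is a minimal Python sketch of the first-order mean-semideviation, rho(Z) = E[Z] + kappa * E[(Z - E[Z])_+], evaluated on a sample of losses. The first-order form, the constant kappa, and the loss sample are illustrative choices, not taken from the paper.

```python
# A minimal sketch of the first-order mean-semideviation risk measure,
# rho(Z) = E[Z] + kappa * E[(Z - E[Z])_+], on a sample of loss values.
# `kappa` and the exponential loss sample are illustrative assumptions.
import numpy as np

def mean_semideviation(losses: np.ndarray, kappa: float = 0.5) -> float:
    """Empirical first-order mean-semideviation risk of a loss sample."""
    mean = losses.mean()
    upper_semidev = np.maximum(losses - mean, 0.0).mean()  # E[(Z - E[Z])_+]
    return mean + kappa * upper_semidev

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=10_000)  # hypothetical loss sample
print(mean_semideviation(sample))  # exceeds the plain mean by the semideviation penalty
```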

    Robust Accelerated Primal-Dual Methods for Computing Saddle Points

    We consider strongly convex/strongly concave saddle point problems, assuming we have access to unbiased stochastic estimates of the gradients. We propose a stochastic accelerated primal-dual (SAPD) algorithm and show that the SAPD iterate sequence, generated using constant primal-dual step sizes, linearly converges to a neighborhood of the unique saddle point, where the size of the neighborhood is determined by the asymptotic variance of the iterates. Interpreting the asymptotic variance as a measure of robustness to gradient noise, we obtain explicit characterizations of robustness in terms of SAPD parameters and problem constants. Based on these characterizations, we develop computationally tractable techniques for optimizing the SAPD parameters, i.e., the primal and dual step sizes and the momentum parameter, to achieve a desired trade-off between convergence rate and robustness on the Pareto curve. This allows SAPD, as an accelerated method, to enjoy fast convergence while remaining robust to noise. We also show that SAPD admits convergence guarantees for the gap metric with a variance term that is optimal up to a logarithmic factor, which can be removed by employing a restarting strategy. Furthermore, to our knowledge, our work is the first to show an iteration complexity result for the gap function on smooth SCSC problems without the bounded domain assumption. Finally, we illustrate the efficiency of our approach on distributionally robust logistic regression problems.
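
    For intuition, the following is a schematic Python sketch of an accelerated primal-dual loop in the spirit of SAPD on a toy SCSC problem with additive Gaussian gradient noise. The toy objective, the placement of the momentum extrapolation on the dual gradient, and all parameter values are our own illustrative assumptions; the paper's exact update rules and its principled Pareto-optimal parameter choices may differ.

```python
# Schematic SAPD-style loop on the toy SCSC problem
# L(x, y) = (mu_x/2)||x||^2 + y^T A x - (mu_y/2)||y||^2
# with additive Gaussian gradient noise. Step sizes tau, sigma and
# momentum theta are illustrative, not the paper's tuned choices.
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 5
A = rng.standard_normal((m, n)) / np.sqrt(n)
mu_x, mu_y, noise = 1.0, 1.0, 0.01

def grad_x(x, y):  # noisy partial gradient in x
    return mu_x * x + A.T @ y + noise * rng.standard_normal(n)

def grad_y(x, y):  # noisy partial gradient in y
    return A @ x - mu_y * y + noise * rng.standard_normal(m)

tau, sigma, theta = 0.05, 0.05, 0.5   # illustrative parameters
x, y = rng.standard_normal(n), rng.standard_normal(m)
gy_prev = grad_y(x, y)
for _ in range(2000):
    gy = grad_y(x, y)
    y = y + sigma * (gy + theta * (gy - gy_prev))  # momentum-extrapolated dual ascent
    x = x - tau * grad_x(x, y)                     # primal descent with updated dual
    gy_prev = gy
# iterates settle in a noise-sized neighborhood of the saddle point (0, 0)
print(np.linalg.norm(x), np.linalg.norm(y))
```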

    High Probability and Risk-Averse Guarantees for a Stochastic Accelerated Primal-Dual Method

    We consider stochastic strongly-convex-strongly-concave (SCSC) saddle point (SP) problems, which frequently arise in applications ranging from distributionally robust learning to game theory and fairness in machine learning. We focus on the recently developed stochastic accelerated primal-dual algorithm (SAPD), which admits optimal complexity in several settings as an accelerated algorithm. We provide high probability guarantees for convergence to a neighborhood of the saddle point that reflect accelerated convergence behavior. We also provide an analytical formula for the limiting covariance matrix of the iterates for a class of stochastic SCSC quadratic problems where the gradient noise is additive and Gaussian. This allows us to develop lower bounds for this class of quadratic problems, which show that our analysis is tight in terms of the dependence of the high probability bound on the problem parameters. We also provide a risk-averse convergence analysis characterizing the Conditional Value at Risk, the Entropic Value at Risk, and the $\chi^2$-divergence of the distance to the saddle point, highlighting the trade-offs between the bias and the risk associated with an approximate solution obtained by terminating the algorithm at any iteration.
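
    As a concrete illustration of the risk measures involved, here is a minimal Python sketch of empirical Conditional Value at Risk and Entropic Value at Risk (via its dual form, EVaR_alpha(Z) = inf_{t>0} t^{-1} log(E[exp(tZ)]/alpha)) on a synthetic sample of distances to the saddle point. The sample, tail level alpha, and the crude grid search over the dual variable are illustrative assumptions, not the paper's construction.

```python
# Empirical CVaR and EVaR of a synthetic sample of ||z_k - z*||.
# Convention: CVaR at level alpha averages the worst alpha-fraction;
# EVaR is computed from its dual form via a grid over t > 0.
import numpy as np

def cvar(samples: np.ndarray, alpha: float = 0.05) -> float:
    k = max(1, int(np.ceil(alpha * samples.size)))
    return np.sort(samples)[-k:].mean()   # mean of the worst alpha-fraction

def evar(samples: np.ndarray, alpha: float = 0.05) -> float:
    ts = np.logspace(-3, 2, 500)          # crude grid over the dual variable t
    vals = [np.log(np.exp(t * samples).mean() / alpha) / t for t in ts]
    return float(np.min(vals))            # EVaR is the infimum over t > 0

rng = np.random.default_rng(2)
dist = np.abs(rng.normal(0.1, 0.05, size=10_000))  # synthetic ||z_k - z*|| sample
print(cvar(dist), evar(dist))  # EVaR upper-bounds CVaR at the same level
```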

    Convergence Rates of Stochastic Gradient Descent under Infinite Noise Variance

    Recent studies have provided both empirical and theoretical evidence illustrating that heavy tails can emerge in stochastic gradient descent (SGD) in various scenarios. Such heavy tails can potentially result in iterates with diverging variance, which hinders the use of conventional convergence analysis techniques that rely on the existence of second-order moments. In this paper, we provide convergence guarantees for SGD under state-dependent, heavy-tailed noise with potentially infinite variance, for a class of strongly convex objectives. In the case where the $p$-th moment of the noise exists for some $p \in [1,2)$, we first identify a condition on the Hessian, coined '$p$-positive (semi-)definiteness', that leads to an interesting interpolation between positive semi-definite matrices ($p=2$) and diagonally dominant matrices with non-negative diagonal entries ($p=1$). Under this condition, we then provide a convergence rate for the distance to the global optimum in $L^p$. Furthermore, we provide a generalized central limit theorem, which shows that the properly scaled Polyak-Ruppert average converges weakly to a multivariate $\alpha$-stable random vector. Our results indicate that even under heavy-tailed noise with infinite variance, SGD can converge to the global optimum without requiring any modification to either the loss function or the algorithm itself, as is typically required in robust statistics. We demonstrate the implications of our results for applications such as linear regression and generalized linear models subject to heavy-tailed data.
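
    To illustrate the setting, here is a minimal Python sketch of SGD on a strongly convex quadratic with heavy-tailed, infinite-variance gradient noise, together with running Polyak-Ruppert averaging. The quadratic objective, the symmetric Pareto-tailed noise model (tail index 1.5, so the $p$-th moment exists only for $p < 1.5$), and the constant step size are illustrative simplifications of the paper's setting.

```python
# SGD on a strongly convex quadratic with infinite-variance gradient noise,
# plus a running Polyak-Ruppert average of the iterates. Dimensions, step
# size, and the noise model are illustrative, not the paper's exact setup.
import numpy as np

rng = np.random.default_rng(3)
d, T, eta = 5, 50_000, 1e-3
H = 2.0 * np.eye(d)                # Hessian of the strongly convex quadratic
x_star = np.ones(d)                # global optimum

def heavy_tail(size, tail_index=1.5):
    """Symmetric Pareto-tailed noise: infinite variance for tail_index < 2."""
    return rng.choice([-1.0, 1.0], size) * rng.pareto(tail_index, size)

x, avg = np.zeros(d), np.zeros(d)
for t in range(1, T + 1):
    grad = H @ (x - x_star) + heavy_tail(d)  # true gradient + heavy-tailed noise
    x = x - eta * grad
    avg += (x - avg) / t                     # running Polyak-Ruppert average
print(np.linalg.norm(avg - x_star))          # averaged iterate lands near x*
```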